1. Package installation

We will use the following packages for this example:

if (FALSE) {
  # set the condition to TRUE (or run these two lines manually) to install
  install.packages(c("akima", "caret", "devtools", "earth", "gam", "ggplot2", "mgcv", "mlbench", "plotmo"))
  devtools::install_github("ck37/ck37r")
}

library(akima)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(ck37r)
library(devtools)
library(gam)
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.14
library(ggplot2)
library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-17. For overview type 'help("mgcv-package")'.
## 
## Attaching package: 'mgcv'
## The following objects are masked from 'package:gam':
## 
##     gam, gam.control, gam.fit, plot.gam, predict.gam, s,
##     summary.gam
library(mlbench)
library(plotmo)
## Loading required package: plotrix
## Loading required package: TeachingDemos
library(earth)

2. Goals

Use the “PimaIndiansDiabetes2” dataset to construct a generalized additive model (GAM) and a multivariate adaptive regression splines model (MARS, a.k.a. “earth”). Blood pressure will be the response variable. Missing values will be median-imputed, and indicator variables will be created to document their missingness.

3. Preprocess the data

# load the dataset
data(PimaIndiansDiabetes2)
?PimaIndiansDiabetes2
data <- PimaIndiansDiabetes2 # give the data a simpler name
str(data)
## 'data.frame':    768 obs. of  9 variables:
##  $ pregnant: num  6 1 8 1 0 5 3 10 2 8 ...
##  $ glucose : num  148 85 183 89 137 116 78 115 197 125 ...
##  $ pressure: num  72 66 64 66 40 74 50 NA 70 96 ...
##  $ triceps : num  35 29 NA 23 35 NA 32 NA 45 NA ...
##  $ insulin : num  NA NA NA 94 168 NA 88 NA 543 NA ...
##  $ mass    : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ...
##  $ pedigree: num  0.627 0.351 0.672 0.167 2.288 ...
##  $ age     : num  50 31 32 21 33 30 26 29 53 54 ...
##  $ diabetes: Factor w/ 2 levels "neg","pos": 2 1 2 1 2 1 2 1 2 2 ...

Check for missing data:

# check for missing cases
sum(is.na(data)) 
## [1] 652
# how much of the data is missing? 
sum(is.na(data)) / (nrow(data)*ncol(data)) # about 9% 
## [1] 0.0943287
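To see which variables drive the missingness, per-column proportions take one line of base R. A sketch on a toy data frame (the names `toy`, `a`, `b` are illustrative, not the Pima data):

```r
# Per-column missingness: is.na() gives a logical matrix,
# colMeans() turns it into proportions.
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
colMeans(is.na(toy))
##         a         b 
## 0.3333333 0.6666667
```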

Recode the “diabetes” vector to numeric type:

data$diabetes <- ifelse(data$diabetes=="pos", 1, 0)

Use Chris K’s handy median impute function to impute missing values:

# impute and add missingness indicators
result <- ck37r::impute_missing_values(data)

# overwrite "data" with new imputed data frame
data <- result$data 
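Conceptually, for a numeric column the imputation amounts to the following base-R sketch (toy data; `ck37r::impute_missing_values()` is more general, e.g. it handles multiple columns at once):

```r
# A conceptual sketch of median imputation plus a missingness indicator
# (toy data; not the actual ck37r implementation).
toy <- data.frame(insulin = c(100, NA, 150, NA, 90))

# 1. record which values were missing
toy$miss_insulin <- as.numeric(is.na(toy$insulin))

# 2. fill the gaps with the median of the observed values
toy$insulin[is.na(toy$insulin)] <- median(toy$insulin, na.rm = TRUE)

toy
##   insulin miss_insulin
## 1     100            0
## 2     100            1
## 3     150            0
## 4     100            1
## 5      90            0
```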

Double check that missing values have been imputed:

# no more NA values
sum(is.na(data))
## [1] 0
# check that missingness indicators have been added
str(data)
## 'data.frame':    768 obs. of  14 variables:
##  $ pregnant     : num  6 1 8 1 0 5 3 10 2 8 ...
##  $ glucose      : num  148 85 183 89 137 116 78 115 197 125 ...
##  $ pressure     : num  72 66 64 66 40 74 50 72 70 96 ...
##  $ triceps      : num  35 29 29 23 35 29 32 29 45 29 ...
##  $ insulin      : num  125 125 125 94 168 125 88 125 543 125 ...
##  $ mass         : num  33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 32.3 ...
##  $ pedigree     : num  0.627 0.351 0.672 0.167 2.288 ...
##  $ age          : num  50 31 32 21 33 30 26 29 53 54 ...
##  $ diabetes     : num  1 0 1 0 1 0 1 0 1 1 ...
##  $ miss_glucose : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ miss_pressure: num  0 0 0 0 0 0 0 1 0 0 ...
##  $ miss_triceps : num  0 0 1 0 0 1 0 1 0 1 ...
##  $ miss_insulin : num  1 1 1 0 0 1 0 1 0 1 ...
##  $ miss_mass    : num  0 0 0 0 0 0 0 0 0 1 ...

4. Generalized additive models (GAMs)

This semester, MLWG has explored linear, polynomial, and spline regression models using single predictors (March 3) as well as stepwise selection using multiple predictors (Feb 17). Deb also offered an informative take on splines earlier today (Mar 17). Last semester, we talked about improving linear regression models via penalized regression (LASSO and ridge) using multiple predictors (Nov 4).

When considering multiple predictor variables, another extension of multiple linear regression can be used: generalized additive models.

Generalized additive models (GAMs) are not bound to linear relationships between the predictors and the response; they can instead incorporate smoothed, nonlinear relationships. Each predictor’s relationship to the response is estimated separately and the contributions are summed (thus making the model “additive”). Smoothing splines are not the only constructs used to build GAMs; they can also be built from natural splines, local regression, polynomial regression, etc.
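The “additive” part is concrete: predictions decompose into a per-term sum. This can be seen even with a plain `lm()` in base R, since `predict(type = "terms")` returns each term’s centered contribution (the data here are simulated for illustration):

```r
# Additivity in action: fitted values equal the sum of per-term
# contributions plus a constant (simulated data, base R only).
set.seed(1)
d <- data.frame(x1 = runif(100), x2 = runif(100))
d$y <- sin(2 * pi * d$x1) + d$x2^2 + rnorm(100, sd = 0.1)

fit <- lm(y ~ poly(x1, 4) + poly(x2, 2), data = d)  # additive: no interactions

contrib <- predict(fit, type = "terms")             # one column per term
recomposed <- rowSums(contrib) + attr(contrib, "constant")
all.equal(unname(recomposed), unname(fitted(fit)))
## [1] TRUE
```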

“Backfitting” means iteratively refitting each predictor’s smooth against the partial residuals of the other terms until the fits converge; this is how the gam package estimates the smooths. (mgcv instead fits the smooths directly via penalized likelihood maximization.)
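A minimal backfitting loop can be sketched with base R’s `smooth.spline()` on simulated data (the names `f1_hat`, `f2_hat` etc. are ours; this mirrors the algorithm in spirit only):

```r
# Backfitting sketch: alternately smooth each predictor against the
# partial residuals that remove the other terms' contributions.
set.seed(1)
n  <- 500
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
y  <- sin(2 * x1) + x2^2 + rnorm(n, sd = 0.3)

alpha  <- mean(y)     # intercept
f1_hat <- rep(0, n)   # current estimate of f1(x1)
f2_hat <- rep(0, n)   # current estimate of f2(x2)

for (iter in 1:10) {
  # refit f1 to the residuals left after removing alpha and f2
  f1_hat <- predict(smooth.spline(x1, y - alpha - f2_hat), x1)$y
  f1_hat <- f1_hat - mean(f1_hat)    # center for identifiability
  # refit f2 to the residuals left after removing alpha and f1
  f2_hat <- predict(smooth.spline(x2, y - alpha - f1_hat), x2)$y
  f2_hat <- f2_hat - mean(f2_hat)
}

cor(alpha + f1_hat + f2_hat, y)  # most of the signal is recovered
```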

See Wood’s book for thorough walkthroughs of GAMs in R.

As always, we also encourage Introduction to Statistical Learning - Chapter 7 for a nice introductory overview and exercises.
See Faraway 2002 for a great intro to regression and ANOVA.

Fit the GAM:

gam1 <- gam(pressure ~ s(glucose) + s(insulin) + s(age) + diabetes,
            family="gaussian",
            method="GCV.Cp",
            data=data)

gam1
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## pressure ~ s(glucose) + s(insulin) + s(age) + diabetes
## 
## Estimated degrees of freedom:
## 2.31 1.00 2.41  total = 7.73 
## 
## GCV score: 127.3935
# view summary output
gam.check(gam1)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 12 iterations.
## The RMS GCV score gradient at convergence was 1.109041e-05 .
## The Hessian was positive definite.
## Model rank =  29 / 29 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##               k'   edf k-index p-value
## s(glucose) 9.000 2.313   1.036    0.83
## s(insulin) 9.000 1.000   0.993    0.44
## s(age)     9.000 2.413   0.992    0.42
names(gam1)
##  [1] "coefficients"      "residuals"         "fitted.values"    
##  [4] "family"            "linear.predictors" "deviance"         
##  [7] "null.deviance"     "iter"              "weights"          
## [10] "prior.weights"     "df.null"           "y"                
## [13] "converged"         "sig2"              "edf"              
## [16] "edf1"              "hat"               "R"                
## [19] "boundary"          "sp"                "nsdf"             
## [22] "Ve"                "Vp"                "rV"               
## [25] "mgcv.conv"         "gcv.ubre"          "aic"              
## [28] "rank"              "gcv.ubre.dev"      "scale.estimated"  
## [31] "method"            "smooth"            "formula"          
## [34] "var.summary"       "cmX"               "model"            
## [37] "control"           "terms"             "pred.formula"     
## [40] "pterms"            "assign"            "xlevels"          
## [43] "offset"            "df.residual"       "min.edf"          
## [46] "optimizer"         "call"
gam1$aic 
## [1] 5904.123
gam1$sig2
## [1] 126.1119

Play with some basic plotting features:

plot(gam1, se=TRUE,
     shade=TRUE, col="black", shade.col="gray80",
     residuals=FALSE,
     pages=1)
title("gam1")

5. Compare the GAM to other similar GAMs!

Our plots suggest that the effect of “glucose” is fairly linear. What if we compare gam1 to two other GAMs - one that excludes glucose entirely, and another that enters glucose as a purely linear term?

# model that excludes glucose
gam2 <- gam(pressure ~ s(insulin) + s(age) + diabetes,
            family="gaussian",
            method="GCV.Cp",
            data=data)

plot(gam2, pages=1)

# model that assumes linear glucose
gam3 <- gam(pressure ~ glucose + s(insulin) + s(age) + diabetes,
            family="gaussian",
            method="GCV.Cp",
            data=data)

plot(gam3, pages=1)

anova(gam1, gam2, gam3, test="F") # small p-value suggests that a non-linear function for glucose is preferable?
## Analysis of Deviance Table
## 
## Model 1: pressure ~ s(glucose) + s(insulin) + s(age) + diabetes
## Model 2: pressure ~ s(insulin) + s(age) + diabetes
## Model 3: pressure ~ glucose + s(insulin) + s(age) + diabetes
##   Resid. Df Resid. Dev       Df Deviance       F    Pr(>F)    
## 1    759.04      95880                                        
## 2    761.57      98098 -2.52839  -2218.4  6.9572 0.0003364 ***
## 3    760.94      96453  0.62785   1645.4 20.7813 0.0001390 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(gam1, gam2, gam3) # is this a multiple comparison problem? 
##            df      AIC
## gam1 8.725948 5904.123
## gam2 6.594223 5917.426
## gam3 7.430271 5906.107
BIC(gam1, gam2, gam3)
##            df      BIC
## gam1 8.725948 5944.644
## gam2 6.594223 5948.048
## gam3 7.430271 5940.612

What if we fit the model to only a subset of the cases - say, only the diabetic patients?

table(data$diabetes, I(data$pregnant>14))
##    
##     FALSE TRUE
##   0   500    0
##   1   266    2
gam4 <- gam(pressure ~ s(glucose) + s(insulin) + s(age) + diabetes,
            family="gaussian",
            data=data,
            subset=(diabetes !=0))

plot(gam4, se=TRUE, seWithMean=TRUE, 
         shade=TRUE, col="blue", shade.col="lightgreen", 
         residuals=FALSE,
         pages=1)
title("GAM - adjusted predictors")

AIC(gam1, gam2, gam3, gam4) # caution: gam4 is fit to a subset of the data, so its AIC is not comparable to the others
##            df      AIC
## gam1 8.725948 5904.123
## gam2 6.594223 5917.426
## gam3 7.430271 5906.107
## gam4 5.777994 2067.410

6. plotmo

The “plotmo” R package offers a great way to visualize regression splines in three dimensions:

plotmo(gam1, all2=TRUE) # show simplified seWithMean plots AND three-dimensional splines for all variable relationships
##  plotmo grid:    diabetes glucose insulin age
##                         0     117     125  29

# non-additive shapes have correlated effects in 3D plane surfaces.

# plot partial dependencies (takes a few minutes)
# plotmo(gam1, all2=TRUE, pmethod = "partdep") 

# faster version of pmethod="partdep"
plotmo(gam1, all2=TRUE, pmethod = "apartdep", 
       caption = "What have I gotten myself in to...") 
## calculating apartdep for diabetes 
## calculating apartdep for glucose 
## calculating apartdep for insulin 
## calculating apartdep for age 
## calculating apartdep for diabetes:glucose 0123456790
## calculating apartdep for diabetes:insulin 0123456790
## calculating apartdep for diabetes:age 0123456790
## calculating apartdep for glucose:insulin 01234567890
## calculating apartdep for glucose:age 01234567890
## calculating apartdep for insulin:age 01234567890

# let's play around with a few more parameters! 
plotmo(gam1, all2=TRUE, pt.col = "green3")
##  plotmo grid:    diabetes glucose insulin age
##                         0     117     125  29

plotmo(gam1, all2=TRUE, pt.col = "green3", smooth.col = "red")
##  plotmo grid:    diabetes glucose insulin age
##                         0     117     125  29

plotmo(gam1, all2=TRUE,  
       pt.col = "green3", 
       smooth.col = "red",
       grid.col="gray80")
##  plotmo grid:    diabetes glucose insulin age
##                         0     117     125  29

# return just some of the plots! 
plotmo(gam1, all2=TRUE, degree1 = c(1,2), degree2=0, col="tomato") # show just the first two predictor plots
##  plotmo grid:    diabetes glucose insulin age
##                         0     117     125  29

plotmo(gam1, all2=TRUE, degree1 = 0, degree2 = 1, # return just the first perspective plot
       caption = "this is called a 'perspective plot'",
       persp.col="orange")

See Wood S. 2006. Generalized additive models: An introduction with R for expert explanations.

“gam” R package

“mgcv” R package

Also check out Stephen Milborrow’s excellent instructions on the “plotmo” R package

7. Multivariate adaptive regression splines (MARS) and (earth)

Multivariate adaptive regression splines (MARS) is a technique developed by Jerome H. Friedman in 1991, and the name is trademarked by Salford Systems. Open-source implementations are thus referred to as “earth” and may not be identical to MARS. Also see the “mda” R package and Friedman’s papers for specifics.

earth = Enhanced Adaptive Regression Through Hinges

These approaches build “surrogate features” (models of the models), usually from one or two predictors at a time. Each predictor is split at a knot into two regions, and the outcome is modeled with a separate linear term on each side. The result is a “piecewise linear model” in which each new feature covers some portion of the data’s range.

The candidate splits are evaluated with linear regression models, and the split with the smallest error is retained. See Kuhn and Johnson, 2016:145 for more information.
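The splits are made with hinge functions, h(x - c) = max(0, x - c). A base-R sketch (the knot values echo the terms earth selects in the fit below, but the toy data and the `lm()` fit are ours, purely for illustration):

```r
# A hinge is zero on one side of the knot and linear on the other.
hinge <- function(x, knot) pmax(0, x - knot)

hinge(c(20, 24.2, 30), knot = 24.2)
## [1] 0.0 0.0 5.8

# A degree-1 MARS model is just a linear model in hinge features, e.g.
# pressure ~ h(mass - 24.2) + h(51 - age). Toy data, true slopes 0.5 / -0.4:
set.seed(1)
toy <- data.frame(mass = runif(200, 18, 45), age = runif(200, 21, 70))
toy$pressure <- 70 + 0.5 * hinge(toy$mass, 24.2) -
                0.4 * hinge(51 - toy$age, 0) + rnorm(200, sd = 2)

fit <- lm(pressure ~ hinge(mass, 24.2) + hinge(51 - age, 0), data = toy)
coef(fit)  # intercept and slopes near 70, 0.5, and -0.4
```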

Fit the earth model:

# fit the model
set.seed(1)
earth1 <- earth(pressure ~ ., data=data, 
                degree=1, nk=5, 
                keepxy=TRUE, nprune=20, nfold=10, ncross=2,
                pmethod="cv", trace=4) 
## Call: earth(formula=pressure~., data=data, pmethod="cv", keepxy=TRUE,
##             trace=4, degree=1, nprune=20, nfold=10, ncross=2, nk=5)
## === pmethod="cv": Preliminary model with pmethod="backward" ===
## 
## x[768,13]:
##     pregnant glucose triceps insulin mass pedigree age diabetes
## 1          6     148      35     125 33.6    0.627  50        1
## 2          1      85      29     125 26.6    0.351  31        0
## 3          8     183      29     125 23.3    0.672  32        1
## ...        1      89      23      94 28.1    0.167  21        0
## 768        1      93      31     125 30.4    0.315  23        0
##     miss_glucose ...
## 1              0 ...
## 2              0 ...
## 3              0 ...
## ...            0 ...
## 768            0 ...
## 
## y[768,1]:
##     pressure
## 1         72
## 2         66
## 3         64
## ...       66
## 768       70
## 
## Forward pass: minspan 7 endspan 11    x[768,13] 78 kB    bx[768,5] 30 kB
## 
##          GRSq    RSq     DeltaRSq Pred     PredName         Cut  Terms   Par Deg
## 1      0.0000 0.0000                    (Intercept)
## 2      0.1136 0.1229       0.1229    7          age          51  2   3         1 
## 4      0.1785 0.1955      0.07266    5         mass        24.2  4   5         1 final (reached nk 5)
## 
## Reached nk 5
## After forward pass GRSq 0.178 RSq 0.196
## Forward pass complete: 5 terms
## 
## nprune=20
## Subset size        GRSq     RSq  DeltaGRSq nPreds  Terms (col nbr in bx)
##           1      0.0000  0.0000     0.0000      0  1
##           2      0.1153  0.1199     0.1153      1  1 3
## chosen    3      0.1836  0.1921     0.0683      2  1 3 4
##           4      0.1824  0.1952    -0.0012      2  1 3 4 5
##           5      0.1785  0.1955    -0.0039      2  1 2 3 4 5
## 
## Prune method "backward" penalty 2 nprune 20: selected 3 of 5 terms, and 2 of 13 preds
## After pruning pass GRSq 0.184 RSq 0.192
## Full model GRSq 0.184 RSq 0.192, starting cross validation
## CV fold  1.1  CVRSq 0.223  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.2  CVRSq 0.219  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.3  CVRSq 0.203  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.4  CVRSq 0.244  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.5  CVRSq 0.118  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.6  CVRSq 0.034  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.7  CVRSq 0.110  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.8  CVRSq 0.182  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  1.9  CVRSq 0.129  n.oof 692 10%  n.infold.nz 692 100%  n.oof.nz  76 100%  
## CV fold  1.10 CVRSq 0.156  n.oof 692 10%  n.infold.nz 692 100%  n.oof.nz  76 100%  
## CV fold  2.1  CVRSq 0.092  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.2  CVRSq 0.067  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.3  CVRSq 0.226  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.4  CVRSq 0.020  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.5  CVRSq 0.175  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.6  CVRSq 0.287  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.7  CVRSq 0.101  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.8  CVRSq 0.300  n.oof 691 10%  n.infold.nz 691 100%  n.oof.nz  77 100%  
## CV fold  2.9  CVRSq 0.038  n.oof 692 10%  n.infold.nz 692 100%  n.oof.nz  76 100%  
## CV fold  2.10 CVRSq 0.255  n.oof 692 10%  n.infold.nz 692 100%  n.oof.nz  76 100%  
## CV all        CVRSq 0.159                 n.infold.nz 768 100%  
## 
## === pmethod="cv": Calling update.earth internally for nterms.selected.by.cv=3 ===
## 
## update.earth: using 768 by 14 data saved by keepxy in original call to earth
## Call: earth(formula=pressure~., data=data.frame[768,14], pmethod="cv",
##             keepxy=TRUE, trace=trace, glm=glm, degree=1,
##             nprune=nterms.selected.by.cv, nfold=0, ncross=1,
##             varmod.method="none", nk=5, Object=rv)
## 
## x[768,13]:
##     pregnant glucose triceps insulin mass pedigree age diabetes
## 1          6     148      35     125 33.6    0.627  50        1
## 2          1      85      29     125 26.6    0.351  31        0
## 3          8     183      29     125 23.3    0.672  32        1
## ...        1      89      23      94 28.1    0.167  21        0
## 768        1      93      31     125 30.4    0.315  23        0
##     miss_glucose ...
## 1              0 ...
## 2              0 ...
## 3              0 ...
## ...            0 ...
## 768            0 ...
## 
## y[768,1]:
##     pressure
## 1         72
## 2         66
## 3         64
## ...       66
## 768       70
## 
## Skipped forward pass
## Subset size        GRSq     RSq  DeltaGRSq nPreds  Terms (col nbr in bx)
##           1      0.0000  0.0000     0.0000      0  1
##           2      0.1153  0.1199     0.1153      1  1 3
## chosen    3      0.1836  0.1921     0.0683      2  1 3 4
##           4      0.1824  0.1952    -0.0012      2  1 3 4 5
##           5      0.1785  0.1955    -0.0039      2  1 2 3 4 5
## 
## Prune method "cv" penalty 2: selected 3 of 5 terms, and 2 of 13 preds
## After pruning pass GRSq 0.184 RSq 0.192
# view summary output
summary(earth1, details=TRUE)
## Call: earth(formula=pressure~., data=data, pmethod="cv", keepxy=TRUE,
##             trace=4, degree=1, nprune=20, nfold=10, ncross=2, nk=5)
## 
##              coefficients
## (Intercept)     75.629539
## h(mass-24.2)     0.498705
## h(51-age)       -0.402813
## 
## Number of cases: 768
## Selected 3 of 5 terms, and 2 of 13 predictors using pmethod="cv"
## Termination condition: Reached nk 5
## Importance: age, mass, pregnant-unused, glucose-unused, ...
## Number of terms at each degree of interaction: 1 2 (additive model)
## GRSq 0.1836131  RSq 0.192106  mean.oof.RSq 0.1691089 (sd 0.0797)
## 
## pmethod="backward" would have selected the same model:
##     3 terms 2 preds,  GRSq 0.1836131  RSq 0.192106  mean.oof.RSq 0.1691089
# view predictor importance
evimp(earth1)
##      nsubsets   gcv    rss
## age         2 100.0  100.0
## mass        1  61.0   61.3
# compute predicted values
earth_pred <- predict(earth1)

# print accuracy
(mse <- mean((data$pressure - earth_pred)^2))
## [1] 118.0642

Earth plots

# plot
# png("earth1.png")
plot(earth1)

# dev.off()
plot(earth1, info=T, type="response", trace=1)
## stats::residuals(object=earth.object, type="response")
## stats::fitted(object=earth.object)
## got model response from object$y

## 
## training rsq 0.19
plotmo(earth1, info=T, type="response", trace=1)#, level=.9)
## stats::predict(earth.object, NULL, type="response", info=TRUE)
## Warning: predict.earth ignored argument 'info'
## stats::fitted(object=earth.object)
## got model response from object$y
## 
##  plotmo grid:    pregnant glucose triceps insulin mass pedigree age
##                         3     117      29     125 32.3   0.3725  29
##  diabetes miss_glucose miss_pressure miss_triceps miss_insulin miss_mass
##         0            0             0            0            0         0
## Warning: predict.earth ignored argument 'info'

## Warning: predict.earth ignored argument 'info'

# 3d MARS plots!
# same syntactical rules apply here as well
plotmo(earth1)
##  plotmo grid:    pregnant glucose triceps insulin mass pedigree age
##                         3     117      29     125 32.3   0.3725  29
##  diabetes miss_glucose miss_pressure miss_triceps miss_insulin miss_mass
##         0            0             0            0            0         0
plotmo(earth1, all2=TRUE, persp.col="azure")
##  plotmo grid:    pregnant glucose triceps insulin mass pedigree age
##                         3     117      29     125 32.3   0.3725  29
##  diabetes miss_glucose miss_pressure miss_triceps miss_insulin miss_mass
##         0            0             0            0            0         0

We can also identify the ideal number of terms:

control <- trainControl(method = "repeatedcv",
                        repeats = 1, number = 1)

grid <- expand.grid(.degree = 1, .nprune = 2:25)

# note: "control" is defined but not passed to train() below, so caret falls
# back to its default bootstrap resampling (25 reps), as the output shows
earth_best_terms <- train(pressure ~ ., data = data, method = "earth",
                          tuneGrid = grid)

earth_best_terms
## Multivariate Adaptive Regression Spline 
## 
## 768 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 768, 768, 768, 768, 768, 768, ... 
## Resampling results across tuning parameters:
## 
##   nprune  RMSE      Rsquared 
##    2      11.35408  0.1073283
##    3      10.89617  0.1793008
##    4      10.93983  0.1760370
##    5      11.02425  0.1692765
##    6      11.04741  0.1662241
##    7      11.12699  0.1582005
##    8      11.14840  0.1586616
##    9      11.18532  0.1575292
##   10      11.23732  0.1538041
##   11      11.26693  0.1516627
##   12      11.29749  0.1494507
##   13      11.30668  0.1499358
##   14      11.38831  0.1487384
##   15      11.42374  0.1469635
##   16      11.44278  0.1454418
##   17      11.44987  0.1448343
##   18      11.44689  0.1451135
##   19      11.44689  0.1451135
##   20      11.44689  0.1451135
##   21      11.44689  0.1451135
##   22      11.44689  0.1451135
##   23      11.44689  0.1451135
##   24      11.44689  0.1451135
##   25      11.44689  0.1451135
## 
## Tuning parameter 'degree' was held constant at a value of 1
## RMSE was used to select the optimal model using  the smallest value.
## The final values used for the model were nprune = 3 and degree = 1.
plot(earth_best_terms)

TODO:

- determine best value for nfold
- explore the ncross argument
- plot cross-validation results
- collect \(R^2\) in different ways
- use cross-validation to select the number of terms
- better discuss partial dependence plots
- include confidence intervals versus prediction intervals
- investigate assumptions of prediction intervals
- include text about interpretation of 3D plotmo regression surfaces
- comprehensively discuss limitations

See Stephen Milborrow’s excellent notes on earth here for lots of handy tips and tricks.

… and view his notes on variance models in earth here

“earth” R package

Friedman 1991 - MARS

Friedman 1993- Fast MARS